fix: AccumulateGrad stream mismatch warning when using DDP with Fabric & Trainer by deependujha · Pull Request #21746 · Lightning-AI/pytorch-lightning

deependujha · 2026-05-28T09:37:13Z

What does this PR do?

Problem

When using Fabric with DDP and gradient accumulation, the following warning
was emitted on every backward pass:

UserWarning: The AccumulateGrad node's stream does not match the stream of the
node that produced the incoming gradient. This may incur unnecessary
synchronization and break CUDA graph capture if the AccumulateGrad node's
stream is the default stream...

This warning did not appear when using plain PyTorch DDP directly.

Root Cause

The original setup_module initialized DistributedDataParallel inside a
new side-stream context (torch.cuda.Stream()):

ctx = torch.cuda.stream(torch.cuda.Stream())
with ctx:
    return DistributedDataParallel(...)

This was intentional for supporting CUDA graph whole-network capture, which
requires DDP to be initialized on a side-stream (see PyTorch docs:
https://docs.pytorch.org/docs/2.12/notes/cuda.html#id5).

However, for normal training (the vast majority of use cases), this causes a
stream mismatch: DDP registers its AccumulateGrad hooks on the side-stream
during initialization, but all subsequent forward/backward passes run on the
default stream. PyTorch detects this cross-stream node reference and emits
the warning.

Plain PyTorch DDP does not hit this because users initialize DDP directly,
without any stream context. Initialization therefore happens on the default
stream, the same stream the forward/backward passes use, so there is no
mismatch.
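The mismatch described above can be illustrated with a toy model. This is not PyTorch's implementation: `ToyAccumulateGrad` and `count_mismatch_warnings` are hypothetical names, and streams are represented as plain strings. It only sketches the idea that a node set up on one stream warns whenever gradients arrive from a different one.

```python
import warnings


class ToyAccumulateGrad:
    """Toy stand-in for an AccumulateGrad node (not the real PyTorch class).

    It remembers the stream that was current when it was set up.
    """

    def __init__(self, current_stream):
        self.stream = current_stream

    def accumulate(self, producer_stream):
        # Warn when the gradient was produced on a different stream
        # than the one this node was set up on.
        if producer_stream != self.stream:
            warnings.warn("AccumulateGrad stream mismatch (toy)", UserWarning)
            return False
        return True


def count_mismatch_warnings(init_stream, backward_stream, steps=3):
    """Set up a node on init_stream, run `steps` backward passes, count warnings."""
    node = ToyAccumulateGrad(init_stream)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, not just the first
        for _ in range(steps):
            node.accumulate(backward_stream)
    return len(caught)
```

In this model, `count_mismatch_warnings("side", "default")` warns on every step (the old Fabric behaviour), while `count_mismatch_warnings("default", "default")` stays silent (plain DDP).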

Fix

Detect whether we are currently inside a CUDA graph capture context using
torch.cuda.is_current_stream_capturing(), and pick the appropriate stream:

- Normal training (capturing=False): initialize DDP on the default
  stream. No mismatch, no warning.
- CUDA graph capture (capturing=True): initialize DDP on a new
  side-stream as before (required by PyTorch). Additionally suppress the
  AccumulateGrad warning globally since the mismatch is intentional in
  this context.
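A minimal sketch of the selection logic described above, assuming a hypothetical helper named `ddp_init_context`; the code actually merged in this PR may be structured differently. The `capturing` argument is an assumption added so the branch can be exercised without a GPU, and the `try`/`except` around the import only keeps the sketch importable where PyTorch is not installed.

```python
import warnings
from contextlib import nullcontext

try:
    import torch
except ImportError:  # keeps the sketch importable without PyTorch
    torch = None


def ddp_init_context(capturing=None):
    """Return the context manager under which DDP would be constructed.

    Hypothetical helper, not the Lightning API.
    """
    if capturing is None:
        # Only query the capture state when CUDA is actually usable.
        capturing = bool(
            torch is not None
            and torch.cuda.is_available()
            and torch.cuda.is_current_stream_capturing()
        )
    if not capturing:
        # Normal training: stay on the default stream, so no mismatch arises.
        return nullcontext()
    # CUDA graph capture: the mismatch is intentional, so silence the warning
    # globally, then hand back a fresh side-stream context as before.
    warnings.filterwarnings(
        "ignore",
        message=r".*AccumulateGrad node's stream does not match.*",
    )
    return torch.cuda.stream(torch.cuda.Stream())
```

A strategy would then wrap the constructor call, along the lines of `with ddp_init_context(): return DistributedDataParallel(...)`.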

This fix is applied to both DDPStrategy in lightning_fabric and
DDPStrategy in pytorch_lightning.

Testing

Verified with the reproduction script from #21567 across 4 GPUs (torch 2.11
and 2.12). Warning no longer appears under normal DDP + gradient accumulation
training.

Before submitting

Was this discussed/agreed via a GitHub issue? (not for typos and docs)
Did you read the contributor guideline, Pull Request section?
Did you make sure your PR does only one thing, instead of bundling different changes together?
Did you make sure to update the documentation with your changes? (if necessary)
Did you write any new necessary tests? (not for typos and docs)
Did you verify new and existing tests pass locally with your changes?
Did you list all the breaking changes introduced by this pull request?
Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:

Reviewer checklist

Is this pull request ready for review? (if not, please submit in draft mode)
Check that all items from Before submitting are resolved
Make sure the title is self-explanatory and the description concisely explains the PR
Add labels and milestones (and optionally projects) to the PR so it can be classified

github-actions · 2026-05-28T09:37:39Z

⚡ Required checks status: All passing 🟢

Groups summary

🟢 pytorch_lightning: Tests workflow

Check ID	Status
pl-cpu-guardian	success	✅

These checks are required after the changes to src/lightning/fabric/strategies/ddp.py, src/lightning/pytorch/strategies/ddp.py.

🟢 fabric: Docs

Check ID	Status
docs-make (fabric, doctest)	success	✅
docs-make (fabric, html)	success	✅

These checks are required after the changes to src/lightning/fabric/strategies/ddp.py.

🟢 pytorch_lightning: Docs

Check ID	Status
docs-make (pytorch, doctest)	success	✅
docs-make (pytorch, html)	success	✅

These checks are required after the changes to src/lightning/pytorch/strategies/ddp.py.

🟢 lightning_fabric: CPU workflow

Check ID	Status
fabric-cpu-guardian	success	✅

These checks are required after the changes to src/lightning/fabric/strategies/ddp.py, tests/tests_fabric/strategies/test_ddp.py.

🟢 mypy

Check ID	Status
mypy	success	✅

These checks are required after the changes to src/lightning/fabric/strategies/ddp.py, src/lightning/pytorch/strategies/ddp.py.

🟢 install

Check ID	Status
install-pkg-guardian	success	✅

These checks are required after the changes to src/lightning/fabric/strategies/ddp.py, src/lightning/pytorch/strategies/ddp.py.

Thank you for your contribution! 💜

Note
This comment is automatically generated and updates for 70 minutes every 180 seconds. If you have any other questions, contact carmocca for help.

codecov-commenter · 2026-05-28T10:06:45Z

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 50.00000% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 87%. Comparing base (1120456) to head (5b7bb7c).
⚠️ Report is 1 commits behind head on master.
✅ All tests successful. No failed tests found.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@           Coverage Diff           @@
##           master   #21746   +/-   ##
=======================================
- Coverage      87%      87%   -0%     
=======================================
  Files         270      270           
  Lines       23975    23987   +12     
=======================================
+ Hits        20750    20755    +5     
- Misses       3225     3232    +7

tchaton

Nice !

…rning

update

1b550e5

deependujha requested review from ethanwharris, justusschock and tchaton as code owners May 28, 2026 09:37

deependujha added 2 commits May 28, 2026 09:38

update

94c9341

update

fba6d6d

deependujha added 5 commits May 28, 2026 10:24

update

bc04154

update

d586142

update

d0e3e42

update

85cbde1

update

12b4d26

deependujha changed the title ~~fix: AccumulateGrad stream mismatch warning when using DDP with Fabric~~ fix: AccumulateGrad stream mismatch warning when using DDP with Fabric & Trainer Jun 1, 2026

tchaton approved these changes Jun 1, 2026

Merge branch 'master' into fix/ddp-accumulate-grad-stream-mismatch-wa…

a6f5739

…rning

deependujha commented Jun 1, 2026

Comment thread src/lightning/pytorch/CHANGELOG.md Outdated

Apply suggestion from @deependujha

5b7bb7c

deependujha enabled auto-merge (squash) June 1, 2026 12:30

deependujha merged commit 35e56ef into master Jun 1, 2026
219 of 225 checks passed

deependujha deleted the fix/ddp-accumulate-grad-stream-mismatch-warning branch June 1, 2026 13:12

deependujha merged 10 commits into master from fix/ddp-accumulate-grad-stream-mismatch-warning

3 participants
